
Creators/Authors contains: "Quick, Harrison"


  1. Abstract

    CDC WONDER is a web-based tool for the dissemination of epidemiologic data collected by the National Vital Statistics System. While CDC WONDER has built-in privacy protections, they do not satisfy formal privacy protections such as differential privacy and thus are susceptible to targeted attacks. Given the importance of making high-quality public health data publicly available while preserving the privacy of the underlying data subjects, we aim to improve the utility of a recently developed approach for generating Poisson-distributed, differentially private synthetic data by using publicly available information to truncate the range of the synthetic data. Specifically, we utilize county-level population information from the US Census Bureau and national death reports produced by the CDC to inform prior distributions on county-level death rates and infer reasonable ranges for Poisson-distributed, county-level death counts. In doing so, the requirements for satisfying differential privacy for a given privacy budget can be reduced by several orders of magnitude, thereby leading to substantial improvements in utility. To illustrate our proposed approach, we consider a dataset comprising over 26,000 cancer-related deaths from the Commonwealth of Pennsylvania belonging to over 47,000 combinations of cause-of-death and demographic variables such as age, race, sex, and county-of-residence and demonstrate the proposed framework’s ability to preserve features such as geographic, urban/rural, and racial disparities present in the true data.
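    The idea of inferring a reasonable range for a Poisson-distributed county-level count from public information can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: it uses a single fixed rate instead of a prior distribution on rates, and the function names, population, rate, and tail probability are all hypothetical.

```python
import math


def poisson_quantile(mean, prob):
    """Smallest k with P(Y <= k) >= prob for Y ~ Poisson(mean)."""
    pmf = math.exp(-mean)  # P(Y = 0)
    cdf = pmf
    k = 0
    while cdf < prob:
        k += 1
        pmf *= mean / k  # P(Y = k) from P(Y = k - 1)
        cdf += pmf
    return k


def plausible_count_range(population, rate, tail=1e-6):
    """Central range of counts for an assumed population size and event rate."""
    mean = population * rate
    return poisson_quantile(mean, tail), poisson_quantile(mean, 1.0 - tail)


# Hypothetical county: 50,000 residents, assumed rate of 2 deaths per 10,000.
low, high = plausible_count_range(population=50_000, rate=2e-4)
print(low, high)
```

    Under these assumed inputs the expected count is 10, so the plausible range is far narrower than the trivial range from 0 to the full population, which is the kind of truncation the abstract describes.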

     
  2. Abstract

    The dissemination of synthetic data can be an effective means of making information from sensitive data publicly available with a reduced risk of disclosure. While mechanisms exist for synthesizing data that satisfy formal privacy guarantees, these mechanisms do not typically resemble the models an end-user might use to analyse the data. More recently, the use of methods from the disease mapping literature has been proposed to generate spatially referenced synthetic data with high utility but without formal privacy guarantees. The objective for this paper is to help bridge the gap between the disease mapping and the differential privacy literatures. In particular, we generalize an approach for generating differentially private synthetic data currently used by the US Census Bureau to the case of Poisson-distributed count data in a way that accommodates heterogeneity in population sizes and allows for the infusion of prior information regarding the underlying event rates. Following a pair of small simulation studies, we illustrate the utility of the synthetic data produced by this approach using publicly available, county-level heart disease-related death counts. This study demonstrates the benefits of the proposed approach’s flexibility with respect to heterogeneity in population sizes and event rates while motivating further research to improve its utility.
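    A generic conjugate Poisson-gamma synthesizer gives a feel for how population sizes and prior information on event rates enter such a model. The sketch below is an assumption-laden illustration, not the mechanism proposed in the paper: it carries no formal privacy guarantee, and the function names, prior parameters, and data are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def synthesize_counts(counts, populations, prior_shape, prior_rate, seed=0):
    """Draw one synthetic count per region from the posterior predictive."""
    rng = random.Random(seed)
    synthetic = []
    for y, n in zip(counts, populations):
        # Conjugate update: Gamma(a, b) prior -> Gamma(a + y, b + n) posterior.
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        rate = rng.gammavariate(prior_shape + y, 1.0 / (prior_rate + n))
        synthetic.append(sample_poisson(n * rate, rng))
    return synthetic


# Hypothetical counts and populations; the prior mean rate is 2 / 8000.
observed = [3, 12, 0]
population = [10_000, 40_000, 2_500]
synthetic = synthesize_counts(observed, population, prior_shape=2.0, prior_rate=8000.0)
print(synthetic)
```

    Larger prior parameters pull every region's rate toward the prior mean, which is the general lever through which prior information trades utility against disclosure risk in this family of models.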

     
  3. Abstract Objectives

    Our objectives were to (i) determine correlations between measurements of THC and of BTEX-H, (ii) apply these linear relationships to predict BTEX-H from measured THC, and (iii) use these correlations as informative priors in Bayesian analyses to estimate exposures.

    Methods

    We used a Bayesian left-censored bivariate framework for all 3 objectives. First, we modeled the relationships (i.e. correlations) between THC and each BTEX-H chemical for various overarching groups of measurements using linear regression to determine if correlations derived from linear relationships differed by various exposure determinants. We then used the same linear regression relationships to predict (or impute) BTEX-H measurements from THC when only THC measurements were available. Finally, we used the same linear relationships as priors for the final exposure models that used real and predicted data to develop exposure estimate statistics for each individual exposure group.

    Results

    Correlations between measurements of THC and each of the BTEX-H chemicals (n = 120 for each of BTEX, 36 for n-hexane) differed substantially by area of the Gulf of Mexico and by time period that reflected different oil-spill related exposure opportunities. The correlations generally exceeded 0.5. Use of regression relationships to impute missing data resulted in the addition of >23 000 n-hexane and 541 observations for each of BTEX. The relationships were then used as priors for the calculation of exposure statistics while accounting for censored measurement data.

    Conclusions

    Taking advantage of observed relationships between THC and BTEX-H allowed us to develop robust exposure estimates where a large amount of data were missing, strengthening our exposure estimation process for the epidemiologic study.
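    The prediction step described in the Methods, imputing a BTEX-H measurement from THC through a fitted linear relationship, can be illustrated with ordinary least squares. This is a simplified sketch and not the Bayesian left-censored bivariate framework itself: it ignores censoring, it assumes the regression is fitted on the natural-log scale, and all measurement values and names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical paired measurements in ppm (not data from the study).
thc = [0.5, 1.0, 2.0, 4.0, 8.0]
benzene = [0.004, 0.011, 0.019, 0.042, 0.078]

# Assumption: the relationship is linear on the log scale.
intercept, slope = fit_line([math.log(v) for v in thc],
                            [math.log(v) for v in benzene])


def predict_benzene(thc_ppm):
    """Impute benzene (ppm) for a sample where only THC was measured."""
    return math.exp(intercept + slope * math.log(thc_ppm))


print(predict_benzene(3.0))
```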

     
  4. Abstract Background

    The 2010 Deepwater Horizon (DWH) oil spill involved thousands of workers and volunteers to mitigate the oil release and clean up after the spill. Health concerns for these participants led to the initiation of a prospective epidemiological study (GuLF STUDY) to investigate potential adverse health outcomes associated with the oil spill response and clean-up (OSRC). Characterizing the chemical exposures of the OSRC workers was an essential component of the study. Workers on the four oil rig vessels mitigating the spill and located within a 1852 m (1 nautical mile) radius of the damaged wellhead [the Discoverer Enterprise (Enterprise), the Development Driller II (DDII), the Development Driller III (DDIII), and the Helix Q4000] had some of the greatest potential for chemical exposures.

    Objectives

    The aim of this paper is to characterize potential personal chemical exposures via the inhalation route for workers on those four rig vessels. Specifically, we present our methodology and descriptive statistics of exposure estimates for total hydrocarbons (THCs), benzene, toluene, ethylbenzene, xylene, and n-hexane (BTEX-H) for various job groups to develop exposure groups for the GuLF STUDY cohort.

    Methods

    Using descriptive information associated with the measurements taken on various jobs on these rig vessels and with job titles from study participant responses to the study questionnaire, job groups [unique job/rig/time period (TP) combinations] were developed to describe groups of workers with the same or closely related job titles. A total of 500 job groups were considered for estimation using the available 8139 personal measurements. We used a univariate Bayesian model to analyze the THC measurements and a bivariate Bayesian regression framework to jointly model the measurements of THC and each of the BTEX-H chemicals separately, both models taking into account the many measurements that were below the analytic limit of detection.

    Results

    Highest THC exposures occurred in TP1a and TP1b, which were before the well was mechanically capped. The posterior medians of the arithmetic mean (AM) ranged from 0.11 ppm (‘Inside/Other’, TP1b, DDII; and ‘Driller’, TP3, DDII) to 14.67 ppm (‘Methanol Operations’, TP1b, Enterprise). There were statistical differences between the THC AMs by broad job groups, rigs, and time periods. The AMs for BTEX-H were generally about two to three orders of magnitude lower than the THC AMs, with benzene and ethylbenzene measurements being highly censored.

    Conclusions

    Our results add new insights to the limited literature on exposures associated with oil spill responses and support the current epidemiologic investigation of potential adverse health effects of the oil spill.
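    Measurements below the limit of detection are typically handled by letting each censored value contribute a cumulative probability to the likelihood instead of a density. The sketch below shows that idea for a single chemical. It assumes log-transformed measurements are normally distributed, which the abstract does not state, and it is not the Bayesian model used in the study; all names and values are hypothetical.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def censored_loglik(log_values, censored, mu, sigma):
    """Log-likelihood of log-scale data with left-censoring.

    For a censored entry, the stored value is the log of its detection
    limit, and the entry contributes log P(X <= limit).
    """
    total = 0.0
    for value, is_censored in zip(log_values, censored):
        z = (value - mu) / sigma
        if is_censored:
            total += math.log(normal_cdf(z))
        else:
            total += normal_logpdf(z) - math.log(sigma)
    return total


# Hypothetical data: two detected values and one below a 0.1 ppm limit.
logs = [math.log(0.8), math.log(1.5), math.log(0.1)]
flags = [False, False, True]
print(censored_loglik(logs, flags, mu=0.0, sigma=1.0))
```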
  5. Summary

    When collecting geocoded confidential data with the intent to disseminate, agencies often resort to altering the geographies before making data publicly available. An alternative to releasing aggregated and/or perturbed data is to release synthetic data, where sensitive values are replaced with draws from models designed to capture distributional features in the data collected. The issues associated with spatially outlying observations in the data, however, have received relatively little attention. Our goal here is to shed light on this problem, to propose a solution—referred to as ‘differential smoothing’—and to illustrate our approach by using sale prices of homes in San Francisco.

     